Classification of ncRNAs using position and size information in deep sequencing data

Authors

  • Florian Erhard
  • Ralf Zimmer

Abstract

MOTIVATION: Small non-coding RNAs (ncRNAs) play important roles in various cellular functions in all clades of life. Next-generation sequencing techniques make it possible to study ncRNAs in a high-throughput manner, and specialized algorithms can detect ncRNA classes such as miRNAs in deep sequencing data. Such methods are typically targeted to a single class of ncRNA. Many of them rely on RNA secondary structure prediction, which is not always accurate, and not all ncRNA classes are characterized by a common secondary structure. Unbiased classification methods for ncRNAs could therefore improve accuracy and help detect new ncRNA classes in sequencing data.

RESULTS: Here, we present a scoring system called ALPS (alignment of pattern matrices score) that uses only primary information from a deep sequencing experiment, i.e. the relative positions and lengths of reads, to classify ncRNAs. ALPS makes no further assumptions (e.g. about structural properties shared within an ncRNA class) and is nevertheless able to identify ncRNA classes with high accuracy. Because ALPS is not designed to recognize a particular class of ncRNA, it can be used to detect novel ncRNA classes, as long as these unknown ncRNAs have a characteristic pattern of read lengths and positions. We evaluate our scoring system on publicly available deep sequencing data and show that it classifies known ncRNAs with high sensitivity and specificity.

AVAILABILITY: Calculated pattern matrices of the datasets hESC and EB are available at the project web site http://www.bio.ifi.lmu.de/ALPS. An implementation of the described method is available upon request from the authors.
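The abstract does not specify how pattern matrices are built or aligned, so what follows is only a minimal sketch of the general idea under stated assumptions, not the ALPS score itself. It assumes that a locus's pattern matrix holds the fraction of reads in each (relative start position, read length) cell, stored sparsely as a dictionary; that two patterns are compared by their best cosine similarity over small shifts along the position axis; and that a query locus receives the label of its most similar reference pattern. The function names `pattern_matrix`, `shift_score`, and `classify`, the parameters `max_read_length` and `max_shift`, and the toy reads are all hypothetical.

```python
# Hypothetical sketch, not the ALPS implementation: a sparse "pattern matrix"
# of read frequencies indexed by (relative start position, read length),
# compared by the best cosine similarity over small positional shifts.
import math
from collections import Counter


def pattern_matrix(reads, locus_length, max_read_length=40):
    """Return {(start, length): fraction of reads} for one locus.

    reads: iterable of (start, length) pairs, start relative to the locus (0-based).
    Reads outside the locus or with an implausible length are ignored.
    """
    counts = Counter()
    for start, length in reads:
        if 0 <= start < locus_length and 0 < length <= max_read_length:
            counts[(start, length)] += 1
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()} if total else {}


def _norm(pattern):
    """Euclidean norm of a sparse pattern matrix."""
    return math.sqrt(sum(v * v for v in pattern.values()))


def shift_score(a, b, max_shift=5):
    """Best cosine similarity of a and b over position shifts in [-max_shift, max_shift].

    Read lengths must match exactly; only the position axis is shifted.
    """
    na, nb = _norm(a), _norm(b)
    if na == 0 or nb == 0:
        return 0.0
    best = 0.0
    for s in range(-max_shift, max_shift + 1):
        # Cell (i, length) of a is compared with cell (i + s, length) of b.
        dot = sum(v * b.get((i + s, length), 0.0) for (i, length), v in a.items())
        best = max(best, dot / (na * nb))
    return best


def classify(query, references, max_shift=5):
    """Nearest-neighbour label: the reference pattern with the highest shift_score."""
    best_label, best_score = None, -1.0
    for label, ref in references:
        score = shift_score(query, ref, max_shift)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


if __name__ == "__main__":
    # Invented toy reads: a miRNA-like locus (mature and star reads) and an unrelated one.
    ref_mirna = pattern_matrix([(5, 22)] * 10 + [(5, 21)] * 5 + [(45, 22)] * 2, locus_length=60)
    ref_other = pattern_matrix([(0, 30)] * 10 + [(20, 18)] * 10, locus_length=60)
    # Same shape as ref_mirna, but every read starts two positions later.
    query = pattern_matrix([(7, 22)] * 8 + [(7, 21)] * 4 + [(47, 22)], locus_length=60)
    label, score = classify(query, [("miRNA-like", ref_mirna), ("other", ref_other)])
    print(label, round(score, 3))  # prints: miRNA-like 0.998
```

The positional shift lets two loci match even when their reads start at slightly different offsets, while read-length columns must agree exactly; the published method may normalize, weight, or align these dimensions differently.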

Similar resources

Hybridization-based reconstruction of small non-coding RNA transcripts from deep sequencing data

Recent advances in RNA sequencing technology (RNA-Seq) enable comprehensive profiling of RNAs by producing millions of short sequence reads from size-fractionated RNA libraries. Although conventional tools for detecting and distinguishing non-coding RNAs (ncRNAs) from reference-genome data can be applied to sequence data, ncRNA detection can be improved by harnessing the full information conte...

Deep Profiling of the Novel Intermediate-Size Noncoding RNAs in Intraerythrocytic Plasmodium falciparum

Intermediate-size noncoding RNAs (is-ncRNAs) have been shown to play important regulatory roles in the development of several eukaryotic organisms. However, they have not been thoroughly explored in Plasmodium falciparum, which is the most virulent malaria parasite infecting humans. By using Illumina/Solexa paired-end sequencing of an is-ncRNA-specific library, we performed a systematic id...

Computational Approaches for the Analysis of ncRNA through Deep Sequencing Techniques

The majority of the human transcriptome is defined as non-coding RNA (ncRNA), since only a small fraction of human DNA encodes for proteins, as reported by the ENCODE project. Several distinct classes of ncRNAs, such as transfer RNA, microRNA, and long non-coding RNA, have been classified, each with its own three-dimensional folding and specific function. As ncRNAs are highly abundant in living...

Unexpected Diversity of Chloroplast Noncoding RNAs as Revealed by Deep Sequencing of the Arabidopsis Transcriptome

Noncoding RNAs (ncRNA) are widely expressed in both prokaryotes and eukaryotes. Eukaryotic ncRNAs are commonly micro- and small-interfering RNAs (18-25 nt) involved in posttranscriptional gene silencing, whereas prokaryotic ncRNAs vary in size and are involved in various aspects of gene regulation. Given the prokaryotic origin of organelles, the presence of ncRNAs might be expected; however, th...

CoRAL: predicting non-coding RNAs from small RNA-sequencing data

The surprising observation that virtually the entire human genome is transcribed means we know little about the function of many emerging classes of RNAs, except for their astounding diversity. Traditional RNA function prediction methods rely on sequence or alignment information, which are limited in their abilities to classify the various collections of non-coding RNAs (ncRNAs). To address this,...

Publication details (Erhard and Zimmer)

Volume: 26
Publication date: 2010